Time Series Homework: Chapter 5 Lesson 1

Tristan Farrow

Revision

I updated everything that was in the key but left everything that wasn’t.

Data

## Selected Walmart Stock Price

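# Packages used below (assumed to be loaded in a setup chunk that is not shown)
library(tidyverse)
library(lubridate)
library(tidyquant)
library(tsibble)
library(fable)
library(plotly)

# Set symbol and date range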
symbol <- "WMT"
company <- "Walmart"
date_start <- "2015-11-01"
date_end <- "2020-01-01"

# Fetch stock prices (can be used to get new data)
stock_df <- tq_get(symbol, from = date_start, to = date_end, get = "stock.prices")

# Transform data into tsibble
stock_ts <- stock_df |>
  mutate(
    dates = date,
    value = adjusted   # adjusted closing price
  ) |>
  dplyr::select(dates, value) |>
  arrange(dates) |>
  mutate(
    diff = value - lag(value),   # day-over-day change in price
    index = row_number()         # trading-day counter used as the tsibble index
  ) |>
  as_tsibble(index = index)

# Visualization

plot_ly(stock_ts, x = ~dates, y = ~value, type = 'scatter', mode = 'lines') %>%
  layout(
    xaxis = list(title = paste0("Dates (", format(ymd(date_start), "%d/%m/%Y"), " to ", format(ymd(date_end), "%d/%m/%Y"), ")")),
    yaxis = list(title = "Adjusted Closing Price (US$)"),
    title = paste0("Time Plot of ", symbol, " Daily Adjusted Closing Price")
  )
## Women's Clothing Retail Sales

# Monthly sales for NAICS 44812 (women's clothing stores), January 2004 to December 2006
retail_ts <- rio::import("https://raw.githubusercontent.com/TBrost/BYUI-Timeseries-Drafts/refs/heads/master/data/retail_by_business_type.csv") |>
  filter(naics == 44812) |>
  mutate(
    month_seq = 1:n(),          # running month counter over the full series
    month = ym(month),          # parse the month column into a Date
    year = year(month),
    month_num = month(month)    # calendar month, 1 to 12
  ) |>
  filter(month >= ym("2004 Jan") & month <= ym("2006 Dec")) |>
  as_tsibble(index = month)

Questions

Question 1 - Key Definitions (10 points)

Answer the prompt to the learning outcome below. Include any mathematical expressions or illustrations that may accompany the definitions and ideas if available.

Answer
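  • Define a linear time series model: A linear time series model expresses the value at time t, \(x_{t}\), as a linear combination of predictor variables plus an error term: \(x_t = \alpha_0 + \alpha_1 u_{1,t} + \alpha_2 u_{2,t} + \dots + \alpha_m u_{m,t} + z_t\), where \(u_{i,t}\) is the value of the i-th predictor at time t, the \(\alpha_i\) are the model parameters, and \(z_t\) is the error at time t. The model is called linear because it is linear in the parameters.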

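  • Represent seasonal factors in a regression model using indicator variables: Each season (here, each month) gets an indicator variable that equals 1 when the observation falls in that season and 0 otherwise. There are two parameterizations, depending on whether the model has an intercept. Without an intercept, every season gets its own coefficient: \(x_t = \alpha_1 t + s_t + z_t\), where \(s_t = \beta_i\) when time t falls in season i, so each \(\beta_i\) is that season's level, which is added to the trend value at time t. With an intercept, one season serves as the reference: its indicator is dropped, and each remaining coefficient is the average difference between that season and the reference season.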

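  • State how to remove a polynomial trend of order m: Apply m-th order differencing, that is, difference the series m times: \(\nabla^m x_t = (1 - B)^m x_t\), where B is the backward shift operator. Each round of differencing lowers the order of the polynomial by one, so after m rounds the deterministic trend is reduced to a constant, and what remains is that constant plus the differenced error terms.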
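
The differencing claim in the last bullet can be checked numerically. The sketch below is only an illustration in base R: the quadratic series and its coefficients are made-up values (not taken from either data set), and the series is noise-free so that the effect of each round of differencing is easy to see.

```r
# Hypothetical noise-free quadratic trend (polynomial of order m = 2)
t <- 1:10
x <- 3 + 2 * t + 0.5 * t^2

# First difference: the quadratic term is gone, a linear trend remains
d1 <- diff(x)
d1
# 3.5 4.5 5.5 6.5 7.5 8.5 9.5 10.5 11.5

# Second difference: only a constant is left (m! times the leading coefficient)
d2 <- diff(x, differences = 2)
d2
# 1 1 1 1 1 1 1 1
```

With real data the second difference would not be exactly constant; it would be a constant plus the twice-differenced error terms.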

Question 3 - Linear model with additive seasonal indicator variables (40 points)

a) Use OLS to estimate a linear model with linear trend and additive seasonal indicator variables of the Women’s Clothing Retail Sales data set. Please report the estimates for the monthly seasonal indicator variables in a professionally formatted table. (See an example HERE)
Answer
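# Decimal-year time index (January 2004 = 2004.000) and month as a factor
retail_ts <- retail_ts |>
  mutate(stats_time = year + (month_num - 1) / 12,
         month_ = factor(month_num))

# OLS fit with a linear trend and 12 monthly indicators; "0 +" removes the intercept
dat_lm <- retail_ts |>
  model(lm = TSLM(sales_millions ~ 0 + stats_time + month_))

# Coefficient table
tidy(dat_lm)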
# A tibble: 13 × 6
   .model term       estimate std.error statistic  p.value
   <chr>  <chr>         <dbl>     <dbl>     <dbl>    <dbl>
 1 lm     stats_time     161.      12.0      13.4 2.41e-12
 2 lm     month_1    -319679.   24053.      -13.3 2.80e-12
 3 lm     month_2    -319609.   24054.      -13.3 2.82e-12
 4 lm     month_3    -319009.   24055.      -13.3 2.93e-12
 5 lm     month_4    -318900.   24056.      -13.3 2.95e-12
 6 lm     month_5    -318937.   24057.      -13.3 2.95e-12
 7 lm     month_6    -319118.   24058.      -13.3 2.92e-12
 8 lm     month_7    -319380.   24059.      -13.3 2.87e-12
 9 lm     month_8    -319322.   24060.      -13.3 2.89e-12
10 lm     month_9    -319193.   24061.      -13.3 2.91e-12
11 lm     month_10   -319071.   24062.      -13.3 2.94e-12
12 lm     month_11   -318905.   24063.      -13.3 2.97e-12
13 lm     month_12   -317451.   24064.      -13.2 3.27e-12
b) Please interpret the coefficient you estimated for the month of January
Answer

Because the model has no intercept, the January coefficient (about -319,679, in millions of dollars) is the level of the fitted January line at time zero (stats_time = 0), that is, the expected January sales in year 0. The data start in 2004, so that value is an extrapolation roughly two thousand years outside the observed range and is meaningless on its own. To make it useful, combine it with the trend inside the timeframe you actually observed: the fitted value for January of year y is the trend coefficient times y plus the January coefficient, and fitted sales for any given month rise by about 161 million dollars per year. Compared with the other months, January has the lowest coefficient (about 70 million dollars below February and about 2,228 million dollars below December), so it is the weakest month for Women's Clothing Retail Sales once the trend is accounted for.
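
As a rough worked example of that interpretation, the sketch below plugs the rounded estimates from the tidy() output into the fitted line for January. The helper name `fitted_jan` is hypothetical, and the printed slope is rounded to three significant figures, so the resulting level is only illustrative; in practice the unrounded coefficients from the fitted model should be used.

```r
# Rounded estimates copied from the printed tidy() output (assumption: the
# rounding error in the slope is large once it is multiplied by the year)
trend_per_year <- 161       # stats_time coefficient, $ millions per year
jan_level      <- -319679   # month_1 coefficient, $ millions at stats_time = 0

# Fitted January value for a given year (hypothetical helper)
fitted_jan <- function(year) trend_per_year * year + jan_level

fitted_jan(2004)                      # 2965
fitted_jan(2005) - fitted_jan(2004)   # 161
```

The year-over-year difference is exactly the trend coefficient, which is why the slope is the easier of the two numbers to interpret directly.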

c) Suppose that instead of estimating a model with an intercept of zero, you let the model estimate an intercept. What would be the interpretation of the intercept estimate?
Answer

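If the model estimates an intercept, all twelve monthly indicators can no longer be included, because they sum to 1 in every row and would be perfectly collinear with the intercept. One month is dropped and becomes the base month (January, the first factor level, under R's default coding). The intercept is then the level of the base month at stats_time = 0, in millions of dollars, and each remaining monthly coefficient is the average difference between that month and the base month. This can be useful when interpreting the coefficients: a positive value means that month averages more sales than the base month, and a negative value means it averages less.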
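
The relationship between the two parameterizations can be sketched with base R's `lm()` on simulated data. Everything in this example (the simulated series, the seasonal effects, the variable names) is made up for illustration and is not the retail data; it only shows that the intercept takes over the role of the dropped month.

```r
# Simulated monthly series: linear trend + seasonal effect + small noise
set.seed(1)
n_years <- 3
sim <- data.frame(
  time   = rep(0:(n_years - 1), each = 12) + rep(0:11, times = n_years) / 12,
  month_ = factor(rep(1:12, times = n_years))
)
season <- c(-2, -1.5, 0, 0.5, 0.3, -0.4, -1, -0.8, -0.5, 0, 0.6, 4)
sim$y <- 10 + 2 * sim$time + season[as.integer(sim$month_)] +
  rnorm(nrow(sim), sd = 0.2)

# Without an intercept: one level per month (12 monthly coefficients)
fit_no_int <- lm(y ~ 0 + time + month_, data = sim)
# With an intercept: month 1 is dropped and becomes the base month
fit_int <- lm(y ~ time + month_, data = sim)

b <- coef(fit_no_int)
a <- coef(fit_int)

# The intercept equals the month-1 level from the no-intercept model
int_matches_jan <- all.equal(unname(a["(Intercept)"]), unname(b["month_1"]))
# Each remaining coefficient is that month's level minus the month-1 level
diff_matches <- all.equal(unname(a[paste0("month_", 2:12)]),
                          unname(b[paste0("month_", 2:12)] - b["month_1"]))
int_matches_jan
diff_matches
```

Both models produce the same fitted values; only the way the seasonal effects are expressed changes.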

d) Please make a five year forecast using the model you estimated in Part a. Use 95% confidence bands.
Answer
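# Forecast horizon: 5 years of monthly observations
num_years_to_forecast <- 5
num_months_to_forecast <- num_years_to_forecast * 12

# Estimated seasonal level for each month (rows 2 to 13 of the coefficient table)
df <- data.frame(
  month_ = factor(1:12),
  estimate = tidy(dat_lm) |> slice(2:13) |> pull(estimate)
)

# Last observed time point (December 2006)
last_time <- max(retail_ts$stats_time)
last_month <- retail_ts$month_num[which.max(retail_ts$stats_time)]

# Future time points; months restart at 1 because the observed data end in December
new_dat <- tibble(
  stats_time = seq(from = last_time + 1/12, by = 1/12, length.out = num_months_to_forecast),
  alpha = tidy(dat_lm) |> slice(1) |> pull(estimate),   # trend slope, kept for reference
  month_num = rep(1:12, times = num_years_to_forecast)
) |>
  mutate(month_ = factor(month_num, levels = levels(retail_ts$month_))) |>
  left_join(df, by = "month_")   # attach the seasonal estimates for reference

# Forecast from the fitted model; only stats_time and month_ enter the model formula
retail_forecast <- dat_lm |>
  forecast(new_data = as_tsibble(new_dat, index = stats_time))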


retail_forecast |>
  autoplot(retail_ts, level = 95) +
  labs(
    title = "Retail Sales Forecast",
    subtitle = "5-Year",
    y = "Sales ($ Millions)",
    x = "Time (Numeric)"
  ) +
  theme_minimal() +
  theme(
    plot.title = element_text(hjust = 0.5),
    plot.subtitle = element_text(hjust = 0.5)
  )
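
As a cross-check on how 95% bands behave for this kind of model, the sketch below fits the same trend-plus-monthly-indicator structure with base R's `lm()` on simulated data and uses `predict()` with `interval = "prediction"`. The data, seed, and variable names are all hypothetical, and this is not the fable workflow used above; it assumes that the bands drawn by `autoplot()` are prediction intervals rather than confidence intervals for the mean.

```r
# Simulated 3 years of monthly data: trend, a December spike, and noise
set.seed(42)
time   <- 2004 + (0:35) / 12
month_ <- factor(rep(1:12, times = 3))
y <- 2000 + 100 * (time - 2004) + 500 * (month_ == "12") + rnorm(36, sd = 20)
sim <- data.frame(y, time, month_)

# Same model structure as Part a: no intercept, linear trend, monthly indicators
fit <- lm(y ~ 0 + time + month_, data = sim)

# Five years (60 months) of future time points
new_dat <- data.frame(
  time   = 2007 + (0:59) / 12,
  month_ = factor(rep(1:12, times = 5), levels = levels(month_))
)

# Point forecasts with 95% prediction intervals (columns: fit, lwr, upr)
pred <- predict(fit, newdata = new_dat, interval = "prediction", level = 0.95)
width <- pred[, "upr"] - pred[, "lwr"]
head(pred)
```

For a given calendar month the interval width grows with the forecast horizon, because the uncertainty in the estimated slope is multiplied by the distance from the observed time range.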

Rubric

Question 1: Definitions
  Mastery (10): The student correctly defined each of the terms and included mathematical expressions or illustrations if available in the text or the Time Series Notebook.
  Incomplete (0): The student did not provide a correct definition for one or more of the terms.

Question 2a: OLS linear trend
  Mastery (10): Students estimate the linear model using OLS and provide well-commented code. Results are presented clearly in a professionally formatted table.
  Incomplete (0): Students struggle to estimate the linear model using OLS or provide poorly commented code. Results may be unclear or inaccurately presented in the table format.

Question 2b: Autocorrelation plots
  Mastery (5): Students create clear plots with appropriate labeling and provide well-commented code.
  Incomplete (0): Plots have insufficient clarity, labeling, or code comments, hindering the analysis of autocorrelation.

Question 2c: Residual AR(p) modeling
  Mastery (10): Students fit residuals appropriately, selecting the order based on the correlogram and partial correlogram. They also include statistical evidence using R statistical tests of AR(p) model fit. They provide well-commented code and present their results clearly.
  Incomplete (0): Submissions struggle to fit residuals or select the order of the autoregressive model using plots and statistical evidence.

Question 2d: GLS linear trend AR(p) errors
  Mastery (15): Students accurately estimate the linear model using GLS, using their results in part c. Results are presented clearly in a professionally formatted table that includes a comparison of the GLS and OLS point estimates, standard errors, and confidence intervals.
  Incomplete (0): Submissions don’t implement the GLS algorithm correctly. Students don’t display the results professionally, or they don’t include a comparison to OLS results.

Question 2e: Autocorrelation Bias
  Mastery (15): Students provide clear analysis of autocorrelation bias and its forecasting implications. They point out the connection between standard errors and forecasting confidence bands.
  Incomplete (0): Students may provide incomplete or inaccurate analysis of autocorrelation bias or its forecasting implications, lacking clarity or depth in discussion of its importance.

Question 3a: OLS additive seasonal indicator variables
  Mastery (10): Students accurately estimate the linear model using OLS, including seasonal indicator variables, and provide well-commented code. Results are presented clearly in a professionally formatted table.
  Incomplete (0): Students struggle to estimate the linear model using OLS or provide poorly commented code. Results may be unclear or inaccurately presented in the table format.

Question 3b: Coefficient interpretation
  Mastery (10): Students provide a correct interpretation of the coefficient for January (including the correct units) and relate it to the effect on the Women’s Clothing Retail Sales.
  Incomplete (0): Interpretation of the coefficient for January is incomplete, inaccurate, or unclear, lacking a direct connection to its effect on the Women’s Clothing Retail Sales.

Question 3c: Perfect Collinearity
  Mastery (10): Students provide a clear interpretation of the intercept estimate in the context of the Women’s Clothing Retail Sales data, considering how it relates to the additive seasonal indicator variables.
  Incomplete (0): Interpretation of the intercept estimate may be incomplete, inaccurate, or unclear. It doesn’t make clear the perfect collinearity problem and the correct interpretation of the dropped variable.

Question 3d: Forecast
  Mastery (10): Students accurately make the five-year forecast using the estimated model, including 95% confidence bands in their plot.
  Incomplete (0): Students encounter difficulties in making the five-year forecast or don’t include the forecast plot. Code may be poorly commented or the inclusion of confidence bands may be omitted.

Total Points: 105